Two-armed bandits with a goal, I. One arm known

نویسندگان
چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Best Arm Identification in Multi-Armed Bandits

We consider the problem of finding the best arm in a stochastic multi-armed bandit game. The regret of a forecaster is here defined by the gap between the mean reward of the optimal arm and the mean reward of the ultimately chosen arm. We propose a highly exploring UCB policy and a new algorithm based on successive rejects. We show that these algorithms are essentially optimal since their regre...

متن کامل

Best arm identification in multi-armed bandits with delayed feedback

We propose a generalization of the best arm identification problem in stochastic multiarmed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedback. The delay in feedback increases the effective sample complexity of standard algorithms, but can be offset if we have access to partial feedback received before a pull is completed. We propose a general framework ...

متن کامل

On ergodic two-armed bandits

A device has two arms with unknown deterministic payoffs, and the aim is to asymptotically identify the best one without spending too much time on the other. The Narendra algorithm offers a stochastic procedure to this end. We show under weak ergodic assumptions on these deterministic payoffs that the procedure eventually chooses the best arm (i.e. with greatest Cesaro limit) with probability o...

متن کامل

Top Arm Identification in Multi-Armed Bandits with Batch Arm Pulls

We introduce a new multi-armed bandit (MAB) problem in which arms must be sampled in batches, rather than one at a time. This is motivated by applications in social media monitoring and biological experimentation where such batch constraints naturally arise. This paper develops and analyzes algorithms for batch MABs and top arm identification, for both fixed confidence and fixed budget settings...

متن کامل

Probability ON ERGODIC TWO - ARMED BANDITS

A device has two arms with unknown deterministic payoffs, and the aim is to asymptotically identify the best one without spending too much time on the other. The Narendra algorithm offers a stochastic procedure to this end. We show under weak ergodic assumptions on these deterministic payoffs that the procedure eventually chooses the best arm (i.e. with greatest Cesaro limit) with probability o...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Advances in Applied Probability

سال: 1980

ISSN: 0001-8678,1475-6064

DOI: 10.1017/s0001867800035497